Abstract: Training over sparse multipath noisy channels is
explored. The energy allocation and the optimal shape of training
signals that enable communication over unknown channels
are characterized as a function of the channels' statistics. The
performance of training is evaluated by the reduction of the
mean square error of the channel estimate and by the decrease
in the penalty term, i.e. the mutual information reduction due
to the uncertainty of the channel. The performance of a
low-dimensional training signal is compared to that of
a full-dimensional one. In particular, the trade-off between the
number of required measurements (signal dimensions) and the
energy allocation is calculated, and it is proven that if the signal
to noise ratio of the received training signal is low, reducing the
number of channel measurements using compressed sensing is
efficient in the sense of energy consumption.

I. Introduction 

Channel statistics determine the incoherent achievable 
rate [1], [2], [3]. If we transmit x over a noisy random LTI 
channel denoted by the random impulse response h, and white 
Gaussian noise z is added such that the received signal is 
y = h * x + z, where * denotes convolution, then the mutual
information between y and x obeys 



\begin{align}
I(y; x) &= I(y; x, h) - I(y; h \mid x) \nonumber \\
        &\geq I(y; x \mid h) - I(y; h \mid x) \tag{1}
\end{align}



The second term of (1), $I(y; h \mid x)$, is the penalty term due
to the uncertainty of the channel h. The mutual information
between y and x is lower bounded by the coherent rate minus the
penalty term. The penalty term is a function of the statistics of
h: the 'richer' the statistics of h, i.e. the larger the entropy
of h, the higher the penalty term.

The statistics of the channel affect its entropy, but what
is their effect on the best way to train the system? This
paper concentrates on training over the sparse multipath
channel. This channel can be considered as a collection of
narrowband eigenchannels, with no interference between them.
Each eigenchannel amplifies the transmitted data by a gain and,
as a result of channel sparsity, there is a dependence between
the gains of the eigenchannels. This dependence causes the low
uncertainty of the channel.

The performance of training can be evaluated from two 
perspectives: the minimum mean square error (MMSE) in 
estimating the channel and the reduction in the penalty term. 



By compressed sensing, one can divide the signal in the
frequency domain into a data part and a training part.
Recovering sparse vectors in noisy environments using
thresholding is discussed in [4]. The idea of recovering sparse
vectors after compressing them is introduced in [5], [6], and
this ability was extended to the noisy case in [7]. Recent works
concentrate on exact pattern recovery [8], [9], [10], i.e. the
ability to detect almost always all the non-zero entries of the
vector h which represents the channel. Compressed sensing of
vectors is discussed in [11], and compressed sensing of channels
in the finite SNR regime is discussed in [12].
A connection between information theory and compressed
sensing is introduced in [13]. That work bounds the number of
required measurements m (the row rank of the compressing
matrix) needed to reduce the mean square error of the
compressed random vector v to a value of $\eta \in \mathbb{R}^+$
in a noisy environment:

\[
m \geq \frac{\mathcal{R}_v(\eta)}{\frac{1}{2}\log(1 + \mathrm{SNR})} \tag{2}
\]



where $\mathcal{R}_v(\eta)$ is the rate distortion function of v
at the point $\eta$ and $\frac{1}{2}\log(1 + \mathrm{SNR})$ is the
capacity of an AWGN channel.
From (2) the total energy of compressing/training is lower
bounded by

\[
m\,\mathrm{SNR} \geq 2\mathcal{R}_v(\eta) \tag{3}
\]

However, it is not clear whether the bounds (2) and (3) are 
achievable. 
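As a numerical illustration of the bounds (2) and (3), the following minimal Python sketch evaluates the smallest number of measurements allowed by (2) and the corresponding energy m SNR. It assumes natural logarithms (rates in nats); the function names and the value R = 100 are hypothetical and chosen only for illustration.

```python
import math


def min_measurements(rate_distortion: float, snr: float) -> float:
    """Lower bound (2) on m, with the rate distortion value given in nats."""
    return rate_distortion / (0.5 * math.log1p(snr))


def min_energy(rate_distortion: float, snr: float) -> float:
    """Total energy m * SNR when m sits exactly on the bound (2)."""
    return min_measurements(rate_distortion, snr) * snr


if __name__ == "__main__":
    R = 100.0  # hypothetical rate distortion value, in nats
    for snr in (0.01, 0.1, 1.0, 10.0):
        m = min_measurements(R, snr)
        e = min_energy(R, snr)
        print(f"SNR={snr:5.2f}  m >= {m:10.1f}  m*SNR >= {e:7.1f}")
```

Under these assumptions the energy m SNR stays above 2R for every SNR and approaches 2R only as SNR tends to zero, which is why the low SNR regime is the natural place to look for energy-efficient training.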

Our Contribution: In accordance with the physical
characteristics of multipath channels in the wideband limit, we
assume that the sparsity of the channel tends to zero and that
the channel remains constant during short coherence periods.
Unlike papers that discuss finite or high SNR [11], [8], this
paper concentrates on training in the low SNR regime, where
sparsity enables recovery of the channel. We design a signal
composed of a data part and a training part such that the
output can be separated into two parts that do not interfere,
and training uses as small a subspace as possible so that the
data space is maximized.

References [9], [10] look for exact pattern recovery, and their
results do not achieve the lower bounds (2) and (3). We show that
in the low SNR regime, if one is satisfied with almost perfect
channel recovery, then using techniques of compressed sensing
the lower bound on the number of required measurements (2) is
achievable while using minimum training energy (3), as long
as the training signal is composed of enough harmonic vectors
and channel measurements are done in the frequency domain.
In addition we evaluate the effect of training on the penalty
term.

TABLE I

A comparison between the number of required measurements
and the total training/compressing energy needed to reduce the
mean square error, from Fletcher et al. [9] and from this
paper. The random channel h is unit norm, its length is $k_c$ and
the number of non-zero entries is L. Unlike [9], which searches
for perfect recovery, we allow a negligible mean square error,
which enables the reduction of the training energy by a factor
of at least four.

A comparison of the required training energy and number of 
channel measurements between this paper and [9] is presented 
in Table I. 

II. Channel model and training signals 

The model assumes that the channel remains constant during
a period of $t_c$, and the maximal delay is $t_d$, where $t_d$ is
significantly smaller than $t_c$ and both are constants independent
of the bandwidth. As the bandwidth increases the number
of delayed reflections of the transmitted signal also grows;
however, the growth is sublinear in the bandwidth. The
receiver gets the signal with additive white Gaussian noise
independent of the transmitted signal and the channel. After
discretizing, the channel can be represented by the vector h
of length $k_c = w \times t_c$, where w is the bandwidth; the last
$w(t_c - t_d)$ entries are zero and only the first $k_d = w \times t_d$
entries may differ from zero.

A. Channel model 

The statistics of each path delay are as follows: each of
the first $k_d$ entries of h is an active path with probability
$\mathcal{L}(w)/k_d$, independently of the other entries, and
$\lim_{w \to \infty} \mathcal{L}(w)/k_d = 0$, so
the channel becomes sparser as the bandwidth increases. The
amplitudes are also independent, with probability density
function $\mathcal{P}(\cdot)$. We assume that
$E[h] = 0$ and $E\|h\|_2^2 = 1$, so that each amplitude of an
active path has zero mean and variance $1/\mathcal{L}(w)$. Since
$k_d$ is significantly smaller than $k_c$, we approximate the
result of the LTI channel by a cyclic convolution and get the
received training signal
$y_{\mathrm{train}} = \sqrt{\mathrm{SNR}}\, x_{\mathrm{train}} * h + z_{\mathrm{train}}$,
where * denotes the cyclic convolution, $z_{\mathrm{train}}$ is
additive white Gaussian noise and SNR is the signal to noise ratio.
Let $x_{\mathrm{data}}$ be the data part of the transmitted signal.
If we transmit and train concurrently then
$y = (\sqrt{\mathrm{SNR}}\, x_{\mathrm{train}} + x_{\mathrm{data}}) * h + z$.
The realization of the channel depends on the pdf $\mathcal{P}$ of
the path gains. We pay special attention to the following cases:
(1) the path gains are Gaussian and (2) the active path gains equal
$1/\sqrt{\mathcal{L}(w)}$ with probability $\frac{1}{2}$ and
$-1/\sqrt{\mathcal{L}(w)}$ with probability $\frac{1}{2}$, so the
absolute value of the gains is constant.
In the Gaussian case we replace h by $h_{\mathrm{gaussian}}$ and in the
constant (absolute value) case we replace h by $h_{\mathrm{constant}}$.
Hence, wherever a general formula contains h, a subscript may be
used to denote the type of the channel (constant or gaussian).
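The channel model above can be sketched in a few lines of Python. This is a minimal illustration under the stated model (independent active paths with probability L(w)/k_d, gains of variance 1/L(w)); the function name `sample_channel` and its parameters are hypothetical, and `num_paths` plays the role of L(w).

```python
import math
import random


def sample_channel(k_c, k_d, num_paths, kind="constant", rng=None):
    """Draw one sparse channel h of length k_c.

    Each of the first k_d taps is active with probability num_paths / k_d,
    independently of the others; the last k_c - k_d taps are always zero.
    """
    rng = rng or random.Random()
    p_active = num_paths / k_d
    scale = 1.0 / math.sqrt(num_paths)  # standard deviation of an active gain
    h = [0.0] * k_c
    for i in range(k_d):
        if rng.random() < p_active:
            if kind == "constant":
                # Constant absolute value, random sign.
                h[i] = scale if rng.random() < 0.5 else -scale
            elif kind == "gaussian":
                h[i] = rng.gauss(0.0, scale)
            else:
                raise ValueError("kind must be 'constant' or 'gaussian'")
    return h


if __name__ == "__main__":
    h = sample_channel(k_c=1000, k_d=100, num_paths=10, rng=random.Random(0))
    print("active taps:", sum(1 for v in h if v != 0.0))
    print("energy     :", sum(v * v for v in h))
```

With this normalization the expected energy of h is one, although any single realization has energy proportional to the number of taps that happened to be active.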

B. Training signals

We introduce two training signals:

• $x_{\mathrm{impulse}}$, which uses all the transmission space (i.e. all
the eigenchannels get training energy).

• $x_{\mathrm{frequency}}$, which uses only part of the transmission space.

1) Impulse probing: Impulse probing means sending a
pulse over the channel h to get its noisy impulse response.
The impulse training signal is:

\[
(x_{\mathrm{impulse}})_i =
\begin{cases}
\sqrt{k_c} & i = 1 \\
0 & \text{otherwise}
\end{cases} \tag{4}
\]

Training with $x_{\mathrm{train}} = x_{\mathrm{impulse}}$ we get the
received training signal:

\[
y_{\mathrm{impulse}} = \sqrt{\mathrm{SNR}}\,\sqrt{k_c}\, h + z_{\mathrm{train}}
\]
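To make impulse probing concrete, the following minimal Python sketch builds the impulse training signal, passes it through a cyclic convolution with h and adds unit-variance Gaussian noise. The helper names are hypothetical; the only assumption beyond the text is that the noise has unit variance per entry, which is what makes SNR the per-entry signal to noise ratio.

```python
import math
import random


def cyclic_convolve(x, h):
    """Cyclic convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[j] * h[(i - j) % n] for j in range(n)) for i in range(n)]


def impulse_signal(k_c):
    """Impulse training signal: sqrt(k_c) in the first entry, zero elsewhere."""
    return [math.sqrt(k_c)] + [0.0] * (k_c - 1)


def impulse_probe(h, snr, rng):
    """Received signal sqrt(SNR) * (x_impulse * h) + unit-variance noise."""
    clean = cyclic_convolve(impulse_signal(len(h)), h)
    return [math.sqrt(snr) * c + rng.gauss(0.0, 1.0) for c in clean]


if __name__ == "__main__":
    h = [0.5, 0.0, -0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
    print(impulse_probe(h, snr=4.0, rng=random.Random(0)))
```

Because the impulse has energy k_c concentrated in a single entry, the noiseless output is simply sqrt(k_c) times h, so every tap of the channel is observed directly.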

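2) Training in the frequency domain: This type of training
uses fewer measurements and makes it possible to divide the band
into a data band and a training band. The eigenvectors of the LTI
channel are the harmonic vectors. The i'th $k_c$-length harmonic
vector $f^{(i)}$ at the k'th position obeys:

\[
f^{(i)}_k = \frac{1}{\sqrt{k_c}}\, e^{\frac{2\pi j\, i (k-1)}{k_c}},
\qquad i = 1, 2, \ldots, k_c
\]

The training signal is chosen randomly in the following way:
Let Q be a random subset of $\{1, 2, \ldots, k_c\}$ of size m. The
training signal $x_{\mathrm{frequency}}$ is:

\[
x_{\mathrm{frequency}} = \sqrt{\frac{k_c}{m}} \sum_{i \in Q} f^{(i)} \tag{5}
\]

Training using $x_{\mathrm{frequency}}$ we get:

\[
y_{\mathrm{train}} = \sqrt{\mathrm{SNR}}\, x_{\mathrm{frequency}} * h + z_{\mathrm{train}} \tag{6}
\]

Let $i_1, i_2, \ldots, i_m$ be the elements of Q and let
$\lambda_{i_j}$, $j = 1, 2, \ldots, m$, be the eigenvalues of the
cyclic convolution represented by h, i.e.
$h * f^{(i_j)} = \lambda_{i_j} f^{(i_j)}$. Then

\begin{align}
y_{\mathrm{train}} &= \sqrt{\mathrm{SNR}}\, x_{\mathrm{frequency}} * h + z_{\mathrm{train}} \tag{7} \\
&= \sqrt{\mathrm{SNR}} \sqrt{\frac{k_c}{m}} \sum_{j=1}^{m} \lambda_{i_j} f^{(i_j)} + z_{\mathrm{train}} \tag{8}
\end{align}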

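Let F be a matrix whose rows are the harmonic vectors
corresponding to Q. Projecting $y_{\mathrm{train}}$ onto F and using
the orthogonality of harmonic vectors we get the vector
$y_{\mathrm{frequency}}$:

\[
y_{\mathrm{frequency}} = F y_{\mathrm{train}}
= \sqrt{\mathrm{SNR}} \sqrt{\frac{k_c}{m}}
\begin{pmatrix} \lambda_{i_1} \\ \vdots \\ \lambda_{i_m} \end{pmatrix}
+ z_{\mathrm{frequency}} \tag{9}
\]

where $z_{\mathrm{frequency}}$ is white Gaussian noise with unit
variance. The convolution (6) is equivalent to projecting h onto
the compressing matrix F. Since $E[|\lambda_i|^2] = 1$,
$\mathrm{SNR}_{\mathrm{frequency}}$, the signal to noise
ratio of $y_{\mathrm{frequency}}$, is

\[
\mathrm{SNR}_{\mathrm{frequency}} = \frac{k_c}{m}\,\mathrm{SNR} \tag{10}
\]

We can now compare training by impulse probing to
compressed training in the frequency domain: in both cases (4)
and (6) the total energy of training is
$\mathrm{SNR}\,\|x_{\mathrm{impulse}}\|_2^2 = \mathrm{SNR}\,\|x_{\mathrm{frequency}}\|_2^2 = \mathrm{SNR}\, k_c$.
However, $y_{\mathrm{impulse}}$, the received training signal of
$x_{\mathrm{impulse}}$, is $k_c$-length with signal to noise
ratio SNR, while $y_{\mathrm{frequency}}$ is m-length with signal
to noise ratio $\frac{k_c}{m}\mathrm{SNR}$.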
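The following minimal Python sketch checks, in the noiseless case, the chain of steps used for training in the frequency domain: the training signal is a scaled sum of harmonic vectors, harmonic vectors are eigenvectors of the cyclic convolution with h, and projecting the output onto the chosen harmonic vectors returns the scaled eigenvalues. All function names are hypothetical, zero-based indices are used for positions, and the subset Q is fixed by hand rather than drawn at random.

```python
import cmath
import math


def harmonic(i, k_c):
    """Unit-norm harmonic vector f^(i); list index k holds position k + 1."""
    return [cmath.exp(2j * math.pi * i * k / k_c) / math.sqrt(k_c)
            for k in range(k_c)]


def cyclic_convolve(x, h):
    """Cyclic convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[j] * h[(t - j) % n] for j in range(n)) for t in range(n)]


def eigenvalue(h, i):
    """Eigenvalue of the cyclic convolution with h for the eigenvector f^(i)."""
    k_c = len(h)
    return sum(h[l] * cmath.exp(-2j * math.pi * i * l / k_c) for l in range(k_c))


def frequency_training_signal(subset, k_c):
    """sqrt(k_c / m) times the sum of the harmonic vectors indexed by subset."""
    m = len(subset)
    vectors = [harmonic(i, k_c) for i in subset]
    scale = math.sqrt(k_c / m)
    return [scale * sum(v[k] for v in vectors) for k in range(k_c)]


def project(y, subset):
    """Inner products of y with the harmonic vectors indexed by subset."""
    k_c = len(y)
    return [sum(f.conjugate() * v for f, v in zip(harmonic(i, k_c), y))
            for i in subset]


if __name__ == "__main__":
    k_c, snr, subset = 16, 0.25, [1, 5, 7]
    h = [0.5, 0.0, -0.5] + [0.0] * 13
    x = frequency_training_signal(subset, k_c)
    y = [math.sqrt(snr) * c for c in cyclic_convolve(x, h)]  # noiseless output
    for i, p in zip(subset, project(y, subset)):
        expected = math.sqrt(snr * k_c / len(subset)) * eigenvalue(h, i)
        print(i, abs(p - expected))
```

The printed differences are at the level of floating-point rounding, which illustrates why m measurements in the frequency domain each see the signal amplified by sqrt(k_c / m) relative to impulse probing with the same total energy.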



III. Performance of training



A. Minimum mean square error

The minimum mean square error of h given $y_{\mathrm{train}}$ is
$E\,\|h - E[h \mid y_{\mathrm{train}}]\|_2^2$. Let
$y_{\mathrm{train}}(\mathrm{SNR}) = \sqrt{\mathrm{SNR}}\, x_{\mathrm{train}} * h + z_{\mathrm{train}}$.
The minimum mean square error of h given $y_{\mathrm{train}}$
as a function of SNR is

\[
\mathrm{mmse}(\mathrm{SNR}) = E\,\|h - E[h \mid y_{\mathrm{train}}(\mathrm{SNR})]\|_2^2 \tag{11}
\]

Obviously, the higher the SNR the smaller the minimum
mean square error, so (11) is monotonically decreasing. Later (in
Theorem 1) we see that the curve of the function behaves as
a decreasing step function.

B. Penalty term and rate distortion function 

An alternative way to quantify the performance of training
is to evaluate the uncertainty of the channel after training.
We introduce two similar criteria: the penalty term and the rate
distortion function. Section IV shows that when using a low
training energy, the minimum mean square error is not reduced,
although the penalty term and the rate distortion function are
strongly affected.

1) Rate distortion function: Let $\eta_0$ be a small (negligible)
positive number. The rate distortion function
$\mathcal{R}_h(\eta_0)$ quantifies the amount of information
required to almost perfectly recover the channel. If we have
already trained the system, the remaining amount of information
required to recover the channel is reduced:
$\mathcal{R}_{h \mid y_{\mathrm{train}}}(\eta_0) \leq \mathcal{R}_h(\eta_0)$.
The rate distortion function without training can be
approximated by

\[
\mathcal{R}_h(\eta_0) \approx (1 + o(1))\, k_d \times H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right) \tag{12}
\]

where $H_b(\cdot)$ is the binary entropy function. (12) is justified
because the information required for an approximate recovery of
h is a discrete $k_d$-length vector which contains the information
on the path delays, plus $\mathcal{L}(w)$ variables that contain
data about the path gains. However, the required information on
the path gains is negligible relative to the required information
on the path delays (see [2]). Let
$\mathcal{R}_h^{(\eta_0)}(\mathrm{SNR})$ be the rate distortion
function after training as a function of SNR:

\begin{align}
\mathcal{R}_h^{(\eta_0)}(\mathrm{SNR})
&= \mathcal{R}_{h \mid y_{\mathrm{train}}(\mathrm{SNR})}(\eta_0) \nonumber \\
&\leq (1 + o(1))\, k_d \times H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right)
 - I(y_{\mathrm{train}}(\mathrm{SNR}); h) \tag{13}
\end{align}



A comparison between the rate distortion function and the 
minimum mean square error after training is possible by 
comparing Figure (a) to Figure (b). 
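For a sense of scale, the approximation (12) and the bound (13) can be evaluated numerically. The sketch below is a minimal illustration that drops the o(1) factors and measures information in nats; the function names and the example values of k_d, L(w) and of the mutual information are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_b(p) in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def rate_distortion_approx(k_d, num_paths):
    """Leading term of (12): k_d * H_b(L(w) / k_d)."""
    return k_d * binary_entropy(num_paths / k_d)


def remaining_rate_bound(k_d, num_paths, mutual_info):
    """Leading term of the bound (13) after learning mutual_info nats."""
    return rate_distortion_approx(k_d, num_paths) - mutual_info


if __name__ == "__main__":
    k_d, num_paths = 2000, 20  # hypothetical channel dimensions
    print("R_h approx      :", rate_distortion_approx(k_d, num_paths))
    print("after 50 nats   :", remaining_rate_bound(k_d, num_paths, 50.0))
```

For sparse channels the value is close to L(w) log(k_d / L(w)), so the cost of describing the channel is dominated by the positions of the active paths, as argued above.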

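2) Penalty term: The penalty term, the reduction in mutual
information due to the uncertainty of the channel, is the mutual
information between the received data signal $y_{\mathrm{data}}$ and
the channel h. Under reasonable assumptions on the data and
training signals, the penalty term equals
$I(y_{\mathrm{data}}; h \mid x_{\mathrm{data}}, y_{\mathrm{train}}(\mathrm{SNR}))$
and is upper bounded by (13):

\[
I(y_{\mathrm{data}}; h \mid x_{\mathrm{data}}, y_{\mathrm{train}}(\mathrm{SNR}))
\leq (1 + o(1))\, k_d \times H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right)
 - I(y_{\mathrm{train}}(\mathrm{SNR}); h) \tag{14}
\]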
C. Optimization

Optimization of the training is done over the number of
required measurements and the energy consumption, so we
want to minimize the energy of $x_{\mathrm{train}}$ and, when
training in the frequency domain, also the number of harmonic
vectors composing $x_{\mathrm{frequency}}$. From [13] we know that
the number of required measurements m for a negligible minimum
mean square error $\eta_0$ is lower bounded by
$m \geq \frac{\mathcal{R}_h(\eta_0)}{\frac{1}{2}\log(1 + \mathrm{SNR})}$,
so the required energy is lower bounded by
$m\,\mathrm{SNR} \geq \frac{2\,\mathrm{SNR}\,\mathcal{R}_h(\eta_0)}{\log(1 + \mathrm{SNR})} \geq 2\mathcal{R}_h(\eta_0)$.
The following section shows that these bounds are achievable.

IV. Training by impulse probing

A. Minimum mean square error and rate distortion function
of $h_{\mathrm{constant}}$

1) Minimum mean square error: Let $\epsilon$ be a positive number
as small as we wish and let

\[
\mathrm{SNR}_0 = \frac{2 k_d H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right)}{k_c} \tag{15}
\]

The following theorem shows the effect of the training energy
on the mean square error of channel recovery:

Theorem 1: If the total training energy is at least

\[
(1 + \epsilon)\, k_c\, \mathrm{SNR}_0 \tag{16}
\]

then the minimum mean square error of $h_{\mathrm{constant}}$ is o(1)
in the wideband limit. On the other hand, if the total training
energy is less than (16) then the asymptotic mean square error of
$h_{\mathrm{constant}}$ is 1 - o(1).
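The energy threshold of Theorem 1 is easy to evaluate. The following minimal Python sketch computes SNR_0 from (15) and the energy (16), assuming natural logarithms; the function names and the example dimensions are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_b(p) in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def snr_zero(k_c, k_d, num_paths):
    """SNR_0 of (15): 2 * k_d * H_b(L(w) / k_d) / k_c."""
    return 2.0 * k_d * binary_entropy(num_paths / k_d) / k_c


def required_training_energy(k_c, k_d, num_paths, eps):
    """Energy threshold (16): (1 + eps) * k_c * SNR_0."""
    return (1.0 + eps) * k_c * snr_zero(k_c, k_d, num_paths)


if __name__ == "__main__":
    k_c, k_d, num_paths = 4000, 2000, 20  # hypothetical dimensions
    print("SNR_0          :", snr_zero(k_c, k_d, num_paths))
    print("energy, eps=0.1:", required_training_energy(k_c, k_d, num_paths, 0.1))
```

Note that k_c SNR_0 equals twice the leading term of the rate distortion function in (12), so the threshold depends on the coherence length k_c only through the per-entry SNR, not through the total energy.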

Sketch of proof: Let
$T_{\mathrm{threshold}} = \sqrt{\frac{k_c}{\mathcal{L}(w)}\,\mathrm{SNR}_0}$.
The proof is based on the fact that only $o(\mathcal{L}(w))$ noise
terms are high such that
$|(z_{\mathrm{train}})_i| > T_{\mathrm{threshold}}$, but as long as the
training energy is higher than (16), almost every
$(y_{\mathrm{train}})_i$ corresponding to an active path obeys
$|(y_{\mathrm{train}})_i| > T_{\mathrm{threshold}}$, so recovery is
almost perfect and a negligible minimum mean square error is
achievable.

On the other hand, if we use a little less training energy
than (16), then the $|(y_{\mathrm{train}})_i|$'s corresponding to
active paths do not reach the threshold $T_{\mathrm{threshold}}$,
and there are many more than $\mathcal{L}(w)$ noise terms that are
bigger than most of the $|(y_{\mathrm{train}})_i|$'s originating from
active paths, so random noise terms are more likely to look like
active paths than the actual ones. As a result, no estimator can
decide whether the origin of $(y_{\mathrm{train}})_i$ is an active
path or a noise term, and the estimation completely fails.
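The thresholding argument can be explored with a small Monte Carlo experiment. The sketch below is only an illustration at finite dimensions: it uses a simple threshold detector rather than the MMSE estimator, it assumes the threshold sqrt(k_c SNR_0 / L(w)), and it compares training energies far from the threshold (a quarter and four times k_c SNR_0), because the sharp transition of Theorem 1 is an asymptotic statement. All names and parameter values are hypothetical.

```python
import math
import random


def binary_entropy(p):
    """Binary entropy function H_b(p) in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def relative_error(energy_factor, k_c, k_d, num_paths, rng):
    """One trial of impulse probing over h_constant with a threshold detector.

    energy_factor scales the training energy relative to k_c * SNR_0.
    Returns ||h - h_hat||^2 / ||h||^2, or None if no path was active.
    """
    snr0 = 2.0 * k_d * binary_entropy(num_paths / k_d) / k_c
    threshold = math.sqrt(k_c * snr0 / num_paths)  # assumed form of T_threshold
    gain = 1.0 / math.sqrt(num_paths)
    amplitude = math.sqrt(energy_factor * snr0 * k_c)
    err = 0.0
    energy = 0.0
    for _ in range(k_d):  # the last k_c - k_d taps are known to be zero
        h = 0.0
        if rng.random() < num_paths / k_d:
            h = gain if rng.random() < 0.5 else -gain
        y = amplitude * h + rng.gauss(0.0, 1.0)
        # Declare an active path only when the observation exceeds the threshold.
        h_hat = math.copysign(gain, y) if abs(y) > threshold else 0.0
        err += (h - h_hat) ** 2
        energy += h * h
    return err / energy if energy > 0.0 else None


def mean_relative_error(energy_factor, trials=20, seed=0):
    """Average relative error over several independent channel draws."""
    rng = random.Random(seed)
    results = [relative_error(energy_factor, 4000, 2000, 20, rng)
               for _ in range(trials)]
    results = [r for r in results if r is not None]
    return sum(results) / len(results)


if __name__ == "__main__":
    for factor in (0.25, 4.0):
        print(f"energy = {factor} * k_c * SNR_0 ->", mean_relative_error(factor))
```

In this setting the detector recovers almost all active paths at the higher energy and misses almost all of them at the lower one, which is the qualitative behavior described in the sketch of proof.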

Interpretation of Theorem 1: This theorem in fact shows
that the required training energy for almost perfect channel
recovery asymptotically achieves the lower bound (3). To see
this, recall from (3) and (12) that the required training
energy to recover the channel is lower bounded by

\[
2\mathcal{R}_h(\eta_0) = 2(1 + o(1))\, k_d \times H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right) \tag{17}
\]

Combining (15), (16) and (17), the training energy of
Theorem 1 achieves the lower bound on training energy (3), because
$\epsilon$ in (16) is as small as we wish.

The mean square error as a function of SNR behaves
approximately as a step function, because the mean square error
of h is 1 - o(1) if $\mathrm{SNR} < (1 - \epsilon)\mathrm{SNR}_0$ and
o(1) if $\mathrm{SNR} > (1 + \epsilon)\mathrm{SNR}_0$. The reduction
in the minimum mean square error occurs in the interval
$[(1 - \epsilon)\mathrm{SNR}_0, (1 + \epsilon)\mathrm{SNR}_0]$, which is
as small as we wish.

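B. Mean square error, penalty term and rate distortion
function of $h_{\mathrm{constant}}$

Although training with limited energy may be inefficient
in the sense that it does not reduce the mean square error, it
does affect the penalty term. Using the I-MMSE connection
we conclude from Theorem 1:

Corollary 2: The penalty term of $h_{\mathrm{constant}}$ after
training (14) is upper bounded by:

\begin{align}
I(y_{\mathrm{data}}; h_{\mathrm{constant}} \mid x_{\mathrm{data}}, y_{\mathrm{train}}(\mathrm{SNR}))
&\leq (1 + o(1))\, k_d \times H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right)
 - I(y_{\mathrm{train}}; h) \nonumber \\
&= (1 + o(1))\, k_d H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right)
 - \frac{k_c}{2} \int_{s=0}^{\mathrm{SNR}} \mathrm{mmse}(s)\, ds \nonumber \\
&= (1 + o(1))\, k_d H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right)
 - \begin{cases}
 \frac{1}{2} k_c\, \mathrm{SNR} & \mathrm{SNR} < (1 - \epsilon)\mathrm{SNR}_0 \\
 (1 - o(1))\, k_d H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right) & \mathrm{SNR} > (1 + \epsilon)\mathrm{SNR}_0
 \end{cases} \tag{18}
\end{align}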



Interpretation: Since the mean square error is a step function
of SNR, the mutual information between $y_{\mathrm{train}}$ and
$h_{\mathrm{constant}}$ increases linearly while recovery fails and
remains constant once recovery is almost perfect. As a result the
penalty term decreases linearly to a negligible value.
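The linear decrease of the penalty term can be written down explicitly at leading order. The sketch below is a minimal illustration that drops all o(1) terms, treats the transition at SNR_0 as perfectly sharp and assumes that the mutual information gained by training grows as k_c SNR / 2 below SNR_0; the function names and dimensions are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy function H_b(p) in nats."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def penalty_bound_constant(snr, k_c, k_d, num_paths):
    """Leading-order upper bound on the penalty term of h_constant.

    Below SNR_0 the bound falls linearly as k_c * SNR / 2 nats are learned;
    above SNR_0 everything has been learned and the bound is zero.
    """
    rate = k_d * binary_entropy(num_paths / k_d)
    learned = min(0.5 * k_c * snr, rate)
    return rate - learned


if __name__ == "__main__":
    k_c, k_d, num_paths = 4000, 2000, 20  # hypothetical dimensions
    snr0 = 2.0 * k_d * binary_entropy(num_paths / k_d) / k_c
    for fraction in (0.0, 0.25, 0.5, 0.75, 1.0, 1.5):
        bound = penalty_bound_constant(fraction * snr0, k_c, k_d, num_paths)
        print(f"SNR = {fraction:4.2f} * SNR_0  bound = {bound:8.3f} nats")
```

Even at half of SNR_0, where the mean square error has not yet decreased, half of the penalty has already been removed under these assumptions.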



C. Mean square error, penalty term and rate distortion
function of $h_{\mathrm{gaussian}}$

The ability to detect a path delay depends on its gain's
impulsivity. As in Theorem 1, training can detect with high
probability the delays of active paths of $h_{\mathrm{gaussian}}$ as
long as
$|(y_{\mathrm{train}})_i| = |\sqrt{\mathrm{SNR}}\,\sqrt{k_c}\, h_i + (z_{\mathrm{train}})_i| > T_{\mathrm{threshold}}$.
We begin with a theorem summarizing the results of estimating
$h_{\mathrm{gaussian}}$ and then compare them to the
$h_{\mathrm{constant}}$ case.

Theorem 3: Let $Q(\cdot)$ be the cumulative distribution function
of a normal random variable. The minimum mean square error
of $h_{\mathrm{gaussian}}$ as a function of SNR obeys:



mmse(SNR) 



/SNRo 

J s= 



/ SNRq 
V SNR 



(19) 



The proof is omitted. 

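Using the I-MMSE connection, we get the following
corollary regarding the penalty term of the estimate of
$h_{\mathrm{gaussian}}$.

Corollary 4: The penalty term (14) of $h_{\mathrm{gaussian}}$ is
upper bounded by:

\[
I(y_{\mathrm{data}}; h_{\mathrm{gaussian}} \mid x_{\mathrm{data}}, y_{\mathrm{train}}(\mathrm{SNR}))
\leq (1 + o(1))\, k_d \times H_b\!\left(\frac{\mathcal{L}(w)}{k_d}\right)
 - \frac{k_c}{2} \int_{s=0}^{\mathrm{SNR}} \mathrm{mmse}(s)\, ds \tag{20}
\]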



The penalty term (14) does not decrease linearly as in the
$h_{\mathrm{constant}}$ case, but in a strictly convex manner, see
Figure (b).

Interpretation of Theorem 3 and Corollary 4:

1) The performance of training in terms of the minimum mean
square error of $h_{\mathrm{gaussian}}$ when
$\mathrm{SNR} < (1 - \epsilon)\mathrm{SNR}_0$ is
better than training over $h_{\mathrm{constant}}$ (compare Theorem 1
to Theorem 3). However, when
$\mathrm{SNR} > (1 + \epsilon)\mathrm{SNR}_0$,
training $h_{\mathrm{constant}}$ yields better results. In any case,
in terms of the penalty term, training over
$h_{\mathrm{constant}}$ is more efficient
at any SNR, see Figure (a) and Figure (b).

2) The performance of training $h_{\mathrm{gaussian}}$ depends on the
impulsivity of the path gains, and is not due to their
uncertainty. If the path gains were Gaussian and known,
the asymptotic results would be identical to the results over
$h_{\mathrm{gaussian}}$, although in the $h_{\mathrm{gaussian}}$ model
the amplitudes are not known.

3) The mean square error, unlike the penalty term, is very
sensitive to the extreme noise values. Since modeling
physical noise as white Gaussian relates to the average
case, it is interesting to measure the behavior of the
extreme case of the physical noise in multipath channels.
Note that the extreme case 'captures' a very low
percentage of the probability mass and the power of the
noise.

V. Training in the frequency domain 

When training in the frequency domain (with the training
signal $x_{\mathrm{frequency}}$) we use only part of the available band
for training and leave the rest of the band to transmit data.
This section shows the conditions under which the lower bound
on training energy for almost perfect channel recovery (3)
is achievable despite the reduction in the band allocated for
training.

The main theorem of this section is based on the 'restricted
isometry property' of matrices defined in [5], [6] and on the
fact that the compressing matrix F (see Section II-B), whose
rows are the m harmonic vectors composing $x_{\mathrm{frequency}}$,
obeys with very high probability [14], [15] the restricted isometry
property for $2\mathcal{L}(w)$-sparse vectors with as small a
parameter as we wish, if the number of rows of F obeys:

m > O [C{w) log kc ^og^ C{w)) 

= O(7^h.„(r;o)log^ (21) 

where the equality in (21) is based on the explicit evaluation of
$\mathcal{R}_{h_{\mathrm{constant}}}(\eta_0)$ in (12).

Recall that $\mathrm{SNR}_{\mathrm{frequency}}$ is the signal to noise
ratio of the m channel measurements. The following theorem shows
when the channel measurements and the training energy can
be minimized together.

Theorem 5: If the total training energy is at least
$(1 + \epsilon)\, k_c\, \mathrm{SNR}_0$ (i.e.
$\mathrm{SNR}_{\mathrm{frequency}} \geq (1 + \epsilon)\frac{k_c}{m}\mathrm{SNR}_0$)
and m, the number of harmonic vectors composing the training
signal $x_{\mathrm{frequency}}$, obeys (21), then the mean square
error of $h_{\mathrm{constant}}$ is o(1) in the wideband limit. On the
other hand, if the total training energy is less than
$(1 - \epsilon)\, k_c\, \mathrm{SNR}_0$ (i.e.
$\mathrm{SNR}_{\mathrm{frequency}} < (1 - \epsilon)\frac{k_c}{m}\mathrm{SNR}_0$)
then the mean square error of $h_{\mathrm{constant}}$ is 1 - o(1).
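The energy condition of Theorem 5 can be checked mechanically. The following minimal Python sketch relates the per-measurement signal to noise ratio to the total training energy and tests the energy condition; it deliberately does not test the condition on m, which is stated only in order notation. The function names and numerical values are hypothetical.

```python
def snr_frequency(snr, k_c, m):
    """Per-measurement SNR of the m frequency-domain measurements."""
    return k_c * snr / m


def total_training_energy(snr_freq, m):
    """Total training energy expressed through the m measurements."""
    return m * snr_freq


def energy_condition_met(snr_freq, m, k_c, snr0, eps):
    """Energy condition of Theorem 5: SNR_frequency >= (1 + eps) * k_c * SNR_0 / m."""
    return snr_freq >= (1.0 + eps) * k_c * snr0 / m


if __name__ == "__main__":
    k_c, snr, snr0, eps = 4000, 0.1, 0.056, 0.1  # hypothetical values
    for m in (50, 500, 4000):
        sf = snr_frequency(snr, k_c, m)
        ok = energy_condition_met(sf, m, k_c, snr0, eps)
        print(f"m={m:5d}  SNR_frequency={sf:7.3f}  "
              f"energy={total_training_energy(sf, m):6.1f}  condition met: {ok}")
```

The total energy is the same for every m, so whether the energy condition holds does not depend on how many harmonic vectors carry the training signal; only the per-measurement SNR changes.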

Interpretation: As long as the training signal
$x_{\mathrm{frequency}}$ (5) is composed of enough harmonic vectors,
such that the corresponding matrix F obeys the restricted isometry
property with a very low parameter, the performance of training is
asymptotically the same as training by impulse probing with
the same total amount of energy. Equation (21) shows that if
the number of channel measurements is of the order of magnitude
of the rate distortion function
$\mathcal{R}_{h_{\mathrm{constant}}}(\eta_0)$ multiplied by the
poly-logarithmic factor in $\mathcal{L}(w)$ of (21), then recovery
is possible using minimum training energy (3). Can we reduce the
number of measurements further and still achieve minimum training
energy? By [15] it is known that if the compressing matrix was
i.i.d. Gaussian, the condition on m is

\[
m \gg \mathcal{R}_{h_{\mathrm{constant}}}(\eta_0) \tag{22}
\]

so for an i.i.d. Gaussian matrix the only condition required to
achieve minimum training energy is that m is a superlinear
function of $\mathcal{R}_{h_{\mathrm{constant}}}(\eta_0)$. In the case
of F, where the rows of F are harmonic vectors, we do not know
whether the condition (21) can be improved.

Using Theorem 5, training in the frequency domain yields a
corollary similar to Corollary 2 and a theorem and corollary
similar to Theorem 3 and Corollary 4 while using the same
total amount of training energy over $h_{\mathrm{gaussian}}$.

VI. Summary 

This paper evaluated the performance of training over
$h_{\mathrm{gaussian}}$ and $h_{\mathrm{constant}}$ in the low SNR
regime. Training over $h_{\mathrm{constant}}$ achieves the lower bound
on training energy for almost perfect recovery. Moreover, recovery
using minimal training energy is possible even using far fewer
measurements than the length of the sparse vector. When training
with an energy even slightly below $k_c\,\mathrm{SNR}_0$, the minimum
mean square error does not decrease at all, but the penalty term and
the rate distortion function are strongly affected.

Figure (a): Minimum mean square error of $h_{\mathrm{constant}}$
(upper curve) vs. $h_{\mathrm{gaussian}}$ (lower curve). Using very
low SNR, training over $h_{\mathrm{gaussian}}$ is more efficient.

Figure (b): The ratio between the rate distortion function after
training $\mathcal{R}_h^{(\eta_0)}(\mathrm{SNR})$ and the initial rate
distortion function $\mathcal{R}_h(\eta_0)$ of $h_{\mathrm{constant}}$
(lower curve) vs. $h_{\mathrm{gaussian}}$ (upper curve). In any
SNR, training $h_{\mathrm{constant}}$ yields better results.